Skip to content

Torch-style tolerances and multi-dispatch validation for accordo#85

Open
mawad-amd wants to merge 4 commits intomainfrom
muhaawad/accordo-restart
Open

Torch-style tolerances and multi-dispatch validation for accordo#85
mawad-amd wants to merge 4 commits intomainfrom
muhaawad/accordo-restart

Conversation

@mawad-amd
Copy link
Member

Summary

  • Add atol, rtol, and equal_nan parameters to compare_snapshots, matching np.allclose/torch.allclose semantics. Legacy tolerance parameter still works as an alias for atol.
  • Capture and validate all kernel dispatches, not just the first. Snapshot gains a dispatch_arrays field; arrays remains the first dispatch's outputs for backward compatibility.
  • Rework IPC protocol to process dispatch records in batches, sending a "done" response after each batch to unblock the next dispatch.

Test plan

  • All 18 accordo tests pass on MI355X (Vultr, ROCm 7.2, Docker)
  • All 13 existing tests pass on MI210 (hpcfund, ROCm 7.1, bare metal)
  • CI (MI325X, Apptainer)

New tests added:

  • test_compare_snapshots_supports_rtol — rtol proportional tolerance
  • test_compare_snapshots_equal_nan_toggle — NaN equality toggle
  • test_compare_snapshots_tolerance_backward_compatibility — legacy tolerance param
  • test_multi_dispatch_second_dispatch_mismatch_detected — catches mismatch in 2nd dispatch
  • test_multi_dispatch_count_mismatch_fails — dispatch count mismatch detection

🤖 Generated with Claude Code

mawad-amd and others added 3 commits March 19, 2026 22:45
…ccordo.

This preserves legacy tolerance compatibility while comparing all dispatches with dispatch-aware mismatch reporting and expanded regression coverage.

Made-with: Cursor
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 23, 2026 00:18
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR extends Accordo’s validation to (1) support torch.allclose/np.allclose-style tolerances (atol, rtol, equal_nan) and (2) correctly capture/validate multiple kernel dispatches rather than only the first dispatch, with IPC updated to acknowledge each dispatch batch.

Changes:

  • Add atol, rtol, and equal_nan options to snapshot comparison while keeping legacy tolerance as an alias for absolute tolerance.
  • Capture outputs for every dispatch (Snapshot.dispatch_arrays) and validate dispatch-by-dispatch, including dispatch-count mismatch detection.
  • Update IPC reader logic to process output-handle records in dispatch-sized batches and respond “done” per batch to unblock the next dispatch.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
accordo/accordo/validator.py Adds allclose-style comparison knobs and validates per-dispatch output arrays.
accordo/accordo/_internal/ipc/communication.py Switches IPC ingestion to record-batching per dispatch and sends “done” per batch.
accordo/accordo/snapshot.py Adds dispatch_arrays to represent multi-dispatch captures.
accordo/accordo/result.py Extends mismatch reporting to include dispatch_index.
accordo/accordo/mcp/server.py Plumbs new tolerance parameters through MCP validation entrypoints.
accordo/README.md Updates docs/examples to use atol/rtol/equal_nan and documents dispatch_arrays.
accordo/accordo/init.py Updates package-level examples to show new comparison parameters.
accordo/tests/test_reduction_validation.py Adds tests for rtol/equal_nan/back-compat tolerance and multi-dispatch validation behavior.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…trics

Addresses Copilot review feedback:
- Snapshot.arrays docstring now states it contains first dispatch only
- Mismatch metrics filter out NaN values so max_difference/mean_difference
  remain informative when equal_nan=False

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants